
Annals of Internal Medicine

American College of Physicians

Preprints posted in the last 30 days, ranked by how well they match Annals of Internal Medicine's content profile, based on 27 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Assessing the Impact of Timing and Coverage of United States COVID-19 Vaccination Campaigns: A Multi-Model Approach

Nande, A.; Larsen, S. L.; Turtle, J.; Davis, J. T.; Bandekar, S. R.; Lewis, B.; Chen, S.; Contamin, L.; Jung, S.-m.; Howerton, E.; Shea, K.; Bay, C.; Ben-Nun, M.; Bi, K.; Bouchnita, A.; Chen, J.; Chinazzi, M.; Fox, S. J.; Hill, A. L.; Hochheiser, H.; Lemaitre, J. C.; Loo, S. L.; Marathe, M.; Meyers, L. A.; Pearson, C. A. B.; Porebski, P.; Przykucki, E.; Smith, C. P.; Venkatramanan, S.; Vespignani, A.; Willard, T. C.; Yan, K.; Viboud, C.; Lessler, J.; Truelove, S.

2026-04-08 public and global health 10.64898/2026.04.07.26349269 medRxiv
Top 0.1%
14.7%

Background: Six years after its emergence, SARS-CoV-2 continues to impose a substantial burden. The impact of vaccination and the optimal timing of its rollout remain uncertain given existing population immunity and variability in outbreak timing between summer and winter. Methods: The US Scenario Modeling Hub convened its 19th round of ensemble projections for COVID-19 hospitalizations and deaths in the United States, in which eight teams projected trajectories in each US state and nationally from April 2025 to April 2026 under five scenarios regarding vaccine recommendations and timing. Recommendations had two eligibility scenarios (high-risk individuals only and all-eligible) and two timing scenarios (classic start: mid-August, earlier start: late June). These were crossed to create four scenarios and were compared against a counterfactual scenario with no vaccination. Findings: Compared to no vaccination, our ensemble projections estimated 90,000 (95% PI 53,000-126,000) hospitalizations averted in the high-risk and classic timing scenario across the US. Expanding to all-eligible age groups averted an additional 26,000 (95% PI 14,000-39,000) hospitalizations; coupling this with the earlier vaccination timing was projected to further reduce national hospitalizations by 15,000 (95% PI -3,000 to 33,000). The majority of teams projected both summer and winter waves. Implications: We project COVID-19 will cause significant hospitalizations and deaths in the US in the 2025-26 season and estimate significant benefits from a broad all-eligible vaccination recommendation. The results also suggest an additional benefit is likely to be gained from an earlier vaccination campaign. Funding: Centers for Disease Control and Prevention; National Institutes of Health (US); National Science Foundation (US)

2
Comparative effectiveness of mRNA-1273 versus protein-based NVX-CoV2705 vaccination on COVID-19-related outcomes among US insured adults during 2024-2025: a retrospective matched cohort study

Wilson, A.; Beck, E.; Hensler, H.; Vicic, N.; Joshi, K.; Patry, E.; Li, L.; Wang, J.; Clarke, C.

2026-04-04 infectious diseases 10.64898/2026.04.02.26350067 medRxiv
Top 0.1%
4.8%

Background: COVID-19 vaccination with periodically updated compositions remains important as SARS-CoV-2 continues to circulate, cause disease, and evolve. Available COVID-19 vaccines in the 2024-2025 season differed by platform (mRNA-1273, an mRNA-based vaccine, and NVX-CoV2705, a recombinant protein-based vaccine) and by antigen composition (KP.2-targeted and JN.1-targeted, respectively). There is limited head-to-head real-world evidence comparing the effectiveness of these different approaches to prevention of severe outcomes with COVID-19. We compared mRNA-1273 with protein-based NVX-CoV2705 in insured US adults vaccinated during the 2024-2025 season. Methods: We conducted a retrospective matched cohort study in a large US claims database. Adults aged 18 years or older who received mRNA-1273 or NVX-CoV2705 between Aug 31, 2024 and Feb 28, 2025 were eligible. Recipients were exactly matched 2:1 on key demographic and clinical factors and then weighted with stabilized inverse probability of treatment weights. Outcomes were medically attended COVID-19 and hospitalization with COVID-19 from day 7 after vaccination through up to 180 days of follow-up. We calculated comparative vaccine effectiveness (cVE) as 100 × (1 − hazard ratio). Results: Of 858,138 eligible mRNA-1273 recipients and 34,667 eligible NVX-CoV2705 recipients, 69,140 and 34,570, respectively, entered the matched cohort. Median (Q1, Q3) follow-up was 180 (163, 180) days for mRNA-1273 and 180 (162, 180) days for NVX-CoV2705. Medically attended COVID-19 occurred in 706 (1.02%) mRNA-1273 recipients and 512 (1.48%) NVX-CoV2705 recipients; adjusted cVE (95% CI) was 31.7% (23.4%, 39.1%). Hospitalization with COVID-19 occurred in 61 (0.09%) and 49 (0.14%) recipients, respectively; adjusted cVE (95% CI) was 40.7% (13.5%, 59.4%). In the 47,754 mRNA-1273 recipients matched to 23,877 NVX-CoV2705 recipients aged ≥65, adjusted cVE (95% CI) was 25.7% (15.4%, 34.8%) against medically attended COVID-19 and 41.7% (14.3%, 60.4%) against hospitalization with COVID-19. Conclusions: In this insured US adult population, mRNA-1273 demonstrated greater effectiveness against medically attended COVID-19 and hospitalization with COVID-19 than the protein-based NVX-CoV2705. These findings highlight the potential public-health importance of considering vaccine platform and variant selection when planning for upcoming seasons.
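The cVE definition quoted in this abstract, 100 × (1 − hazard ratio), can be sketched in a few lines. This is only an illustration of the arithmetic, not the authors' analysis code: the function names and the example hazard ratios are hypothetical, and the adjusted hazard ratios themselves would come from a fitted survival model that is not shown here. Note that the confidence bounds swap when moving from the hazard-ratio scale to the cVE scale.

```python
def cve_from_hazard_ratio(hr: float) -> float:
    """Comparative vaccine effectiveness in percent: 100 x (1 - HR)."""
    if hr <= 0:
        raise ValueError("hazard ratio must be positive")
    return 100.0 * (1.0 - hr)


def cve_interval(hr_lower: float, hr_upper: float) -> tuple[float, float]:
    """Map a hazard-ratio confidence interval onto the cVE scale.

    The bounds swap: the upper HR bound gives the lower cVE bound.
    """
    return cve_from_hazard_ratio(hr_upper), cve_from_hazard_ratio(hr_lower)


# Hypothetical hazard ratio and interval, chosen for illustration only.
print(cve_from_hazard_ratio(0.70))
print(cve_interval(0.60, 0.80))
```

A hazard ratio below 1 yields a positive cVE (the first vaccine performs better than the comparator); a hazard ratio above 1 yields a negative value.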

3
A protocol for assessment of interventions using a computational phenotype for Long COVID

Amitabh Gunjan, A.; Huang, L.; Appe, A.; McKelvey, P. A.; Algren, H. A.; Berry, M.; Mozaffari, E.; Wright, B. J.; Hadlock, J. J.; Goldman, J. D.

2026-03-27 infectious diseases 10.64898/2026.03.26.26347671 medRxiv
Top 0.1%
4.4%

Background: Long COVID presents with one or multiple symptoms or diagnosable conditions after SARS-CoV-2 infection. To study whether use of the antiviral remdesivir in persons hospitalized with acute COVID-19 is associated with reduced Long COVID, we created a computational phenotype for Long COVID. Methods: In electronic health records (EHR) from a multistate healthcare system (US), hospital admissions from 5/1/20 - 9/30/22 were reviewed. The study group was hospitalized with acute COVID-19 and the control group was hospitalized for other reasons without prior SARS-CoV-2 infection. The populations were balanced with overlap weights based on a high-dimensional propensity score of pre-specified variables and the top 100 comorbidities differing between the groups. Hazard ratios (HR) were calculated for the combined primary outcome: U09.9 (Post-Covid Conditions) or any incident secondary outcome from 90 to 365 days after admission. Secondary outcomes included 27 individual incident diagnoses, corrected for multiplicity with Holm-Bonferroni. Results: Admissions included 45,540 with, and 409,186 without COVID-19 during the study period, evaluable for the primary outcome. After weighting, standardized difference was < 0.01 for all measured confounders including demographic and clinical features. In the COVID+ and non-COVID groups 38.0% and 29.3% met the combined primary outcome, respectively. Weighted HR (95%CI) for the primary outcome was 1.37 (1.35, 1.40), p < 0.0001. All secondary outcomes were associated with the COVID+ group, when adjusted for multiplicity. Incident diagnoses with strong associations (HR > 2) included thromboembolism, hair loss, diabetes mellitus, obesity, and hypoxia. Anosmia/dysgeusia was associated with COVID, but wide confidence intervals reflected few charted diagnoses. 
Conclusions: Manifestations of Long COVID at population scale are detectable as part of routine symptoms and clinical diagnoses in the EHR after admissions for COVID-19, compared with all other hospital admissions. This a priori computational phenotype for Long COVID will be used to assess whether remdesivir use is associated with decreased Long COVID.
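The abstract above corrects its 27 secondary outcomes for multiplicity with Holm-Bonferroni. A minimal sketch of that step-down procedure follows, assuming a plain list of p-values and a family-wise alpha of 0.05; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def holm_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, for each p-value, whether its null hypothesis is rejected.

    Step-down rule: sort p-values ascending, compare the k-th smallest
    (k = 0, 1, ...) against alpha / (m - k), and stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also retained
    return reject


# Hypothetical p-values for four outcomes.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Compared with a plain Bonferroni correction, the threshold relaxes at each step, so the procedure controls the family-wise error rate while rejecting at least as many hypotheses.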

4
Latent Class Analysis Identifies Pulmonary Function Trajectory Phenotypes in Lung Transplant Recipients with Chronic Allograft Dysfunction

Neely, M.; Wojdyla, D. M.; Hong, H.; Wang, P.; Anderson, M. R.; Arroyo, K.; Belperio, J.; Benvenuto, L.; Budev, M.; Combs, M.; Dhillon, G.; Hsu, J. Y.; Kalman, L.; Martinu, T.; McDyer, J.; Oyster, M.; Pandya, K.; Reynolds, J. M.; Rim, J. G.; Roe, D. W.; Shah, P. D.; Singer, J. P.; Singer, L.; Snyder, L. P.; Tsuang, W.; Weigt, S. S.; Christie, J. D.; Palmer, S. M.; Todd, J.

2026-04-23 transplantation 10.64898/2026.04.22.26351501 medRxiv
Top 0.1%
4.3%

Background: We aimed to identify data-driven FEV1 trajectory phenotypes post-chronic lung allograft dysfunction (CLAD), relate these phenotypes to patient factors and future graft loss, and develop a classification approach for prospective patients. Methods: We studied adult first lung recipients with probable CLAD from two prospective multicenter cohorts: CTOT-20 (n=206) and LTOG (n=1418). FEV1 trajectories over the first nine months post-CLAD were characterized using joint latent class mixed models, jointly modelling time-to-graft loss to account for informative censoring. Models were fit independently in both cohorts and also only among LTOG bilateral recipients. A classification and regression tree (CART) model was derived in LTOG bilateral recipients and applied to CTOT-20 bilateral recipients. Findings: Four distinct early FEV1 trajectory classes were identified in CTOT-20, with large differences in nine month graft loss (72.3%, 31.1%, 2.2%, 0%). In LTOG, similar trajectory patterns were reproduced, with an additional class demonstrating early post-CLAD FEV1 improvement. Among bilateral recipients, trajectory classes showed a clear risk gradient, including a high-risk class with 100% graft loss and a low-risk class with no early graft loss. A CART model incorporating clinical and spirometric variables demonstrated good discrimination in LTOG bilateral recipients (multiclass AUC 0.85) and consistent class assignment and trajectory patterns when applied to CTOT-20. Interpretation: We identified reproducible, clinically meaningful early post-CLAD FEV1 trajectory phenotypes with differential graft loss risk. These phenotypes and a pragmatic classification tool may support risk stratification, trial enrichment, and improved prognostication for patients and clinicians.

5
High-Throughput Observational Evidence Generation Using Linked Electronic Health Record and Claims Data

Gombar, S.; Shah, N.; Sanghavi, N.; Coyle, J.; Mukerji, A.; Chappelka, M.

2026-04-07 health informatics 10.64898/2026.04.07.26350300 medRxiv
Top 0.1%
4.0%

Background: The observational literature on comparative effectiveness is expanding rapidly but remains difficult to synthesize. Discordant findings often stem from structural differences in cohort definitions, inclusion criteria, and follow-up windows, leaving stakeholders without a cohesive evidence base. Furthermore, studies typically focus on a narrow subset of outcomes, neglecting the broader needs of diverse healthcare stakeholders 1,2,3,4. Methods: We developed a high-throughput evidence generation workflow using linked EHR and administrative claims data. The cornerstone is a prespecified measurement architecture applied uniformly across clinical scenarios: six post-index windows (acute to two-year follow-up); 28 Elixhauser comorbidities; 14 healthcare resource utilization (HCRU) categories; 29 laboratory measures with 52 binary thresholds; and 42 adverse event categories. We generated unadjusted treatment comparisons across ~1,038 outcomes per scenario, including effect-measure modification (EMM) assessments across 130 baseline features. Results: Across 40 clinical domains, the workflow produced approximately 32,982,552 outcome evaluations. Each evaluation comprised an effect estimate for one combination of treatment comparison, outcome, and population, with uncertainty bounds and supporting diagnostics. Approximately 5,000 narrative summaries underwent structured clinical and statistical quality control before dissemination. Conclusions: Standardized, high-throughput workflows can shift evidence generation away from fragmented studies toward comprehensive evidence packages. This shared evidence base supports precision medicine by making treatment effect heterogeneity visible across clinically meaningful subpopulations, reducing the need for redundant, stakeholder-specific studies.

6
Persistent Racial Inequities in Acute Kidney Injury Among U.S. Hospitalizations: A Nationwide Cohort Analysis

Tai, B.; Okonkwo, C.

2026-03-27 public and global health 10.64898/2026.03.24.26349246 medRxiv
Top 0.2%
3.5%

Background Acute kidney injury (AKI) is a major contributor to morbidity, mortality, and healthcare utilization among hospitalized adults. Long-standing racial and ethnic inequities in U.S. healthcare--including unequal access to care, neighborhood disadvantage, and other structural factors--are known to influence kidney health, yet national data describing how these inequities manifest in AKI remain limited. Methods We conducted a retrospective, cross-sectional analysis of the 2022 National Inpatient Sample. AKI was identified using ICD-10-CM codes N17.x, and race/ethnicity followed HCUP categories. Descriptive analyses compared characteristics across groups. Survey-weighted logistic regression estimated adjusted odds of developing AKI, in-hospital mortality among AKI patients, and dialysis use, adjusting for demographics, payer, and comorbidities. Age-specific predicted AKI probabilities were derived from the adjusted model. Results AKI prevalence ranged from 15% to 23% across racial and ethnic groups. After adjustment, Black (OR 1.34), Native American (OR 1.08), and Other patients (OR 1.07) had higher odds of AKI, whereas Asian/Pacific Islander (OR 0.94) and Hispanic (OR 0.98) had slightly lower or similar odds. Among AKI hospitalizations, mortality was modestly lower for Black and Hispanic patients relative to White patients and higher for Asian/Pacific Islander and Native American patients. All non-White groups had higher odds of dialysis use. Age-specific curves showed persistent risk differences across adulthood. Conclusions Substantial racial disparities in AKI incidence, mortality, and dialysis use persisted after adjustment, reflecting broader structural inequities. Addressing these gaps will require both targeted clinical strategies and policy interventions focused on upstream determinants.

7
2024/25 end-of-season KP.2 vaccine effectiveness against COVID-19 hospitalization in older adults: a test-negative study in Quebec, Canada

Carazo, S.; Skowronski, D. M.; Sauvageau, C.; Talbot, D.; Racine, E.; Brousseau, N. M.

2026-04-04 infectious diseases 10.64898/2026.04.02.26350050 medRxiv
Top 0.2%
3.5%

We evaluated 2024/25 KP.2 vaccine effectiveness (VE) against COVID-19 hospitalization among adults >60 years old eligible for publicly funded vaccination during fall and/or spring campaigns in the province of Quebec, Canada. We included Quebec residents tested for COVID-19-compatible symptoms in an acute-care hospital between October 13, 2024 (epi-week 2024-42) and August 23, 2025 (2025-34), linking vaccine, hospital, chronic disease, and laboratory administrative records to assess VE using a test-negative design. We compared the odds of being COVID-19 test-positive versus test-negative among vaccinated versus non-vaccinated participants, adjusting for sex, age, comorbidities, place of residence, and epidemiological week. Overall, 49,949 (43%) participants were vaccinated. Over an analysis period spanning up to ten months, with a median time since vaccination of 16 weeks (interquartile range 9-24 weeks), VE was 34% overall, declining from 43% at <8 weeks to negligible by the 32nd week post-vaccination. Findings confirm meaningful but short-lived COVID-19 vaccine protection against hospitalization in older adults.
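In a test-negative design, VE is conventionally estimated as (1 − odds ratio) × 100, where the odds ratio compares the odds of vaccination among test-positive and test-negative patients. The abstract does not spell out its formula, so treating it this way is an assumption. The sketch below computes only a crude, unadjusted estimate from a 2×2 table; the counts and the function name are hypothetical, and the study's own estimates are adjusted for covariates, which this sketch does not attempt.

```python
def crude_tnd_ve(vacc_pos: int, vacc_neg: int,
                 unvacc_pos: int, unvacc_neg: int) -> float:
    """Crude test-negative-design VE in percent: 100 x (1 - OR).

    OR = (vaccinated positives / vaccinated negatives)
         / (unvaccinated positives / unvaccinated negatives)
    """
    if min(vacc_pos, vacc_neg, unvacc_pos, unvacc_neg) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (vacc_pos / vacc_neg) / (unvacc_pos / unvacc_neg)
    return 100.0 * (1.0 - odds_ratio)


# Hypothetical counts, not taken from the Quebec study.
ve = crude_tnd_ve(vacc_pos=50, vacc_neg=950, unvacc_pos=100, unvacc_neg=900)
print(round(ve, 1))
```

An adjusted analysis would instead take the odds ratio from a logistic regression of test result on vaccination status and the covariates listed in the abstract.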

8
Invasive cervical cancers after an HPV-negative test: insights from screening histories

Hassan, S. S.; Nordqvist-Kleppe, S.; Asinger, N.; Wang, J.; Dillner, J.; Arroyo Muhr, L. S.

2026-04-13 public and global health 10.64898/2026.04.11.26350679 medRxiv
Top 0.2%
3.2%

Human papillomavirus (HPV) testing is the primary method for cervical cancer screening, and a negative HPV test is associated with a very low subsequent risk of invasive cancer. Nevertheless, a small number of cervical cancers are diagnosed following an HPV-negative test result, posing challenges within HPV-based screening pathways. Using nationwide Swedish registry data on HPV testing, we identified women diagnosed with invasive cervical cancer between 2019 and 2024 and reconstructed HPV testing histories from the National Cervical Screening Registry (NKCx). The most recent HPV test prior to diagnosis was defined as the index test, and longitudinal HPV testing trajectories were classified among women with an HPV-negative index test. Of 3,000 women diagnosed with invasive cancer, 243 (8.1%) had an HPV-negative index test. These women were older at diagnosis and more frequently diagnosed at advanced stages compared with women with an HPV-positive index test. Most HPV-negative index tests (66.3%) were performed in the peri-diagnostic period (+/- 30 days). Among women with an HPV-negative index test, 52.7% (128/243) had no prior HPV testing recorded, while the remainder had consistently HPV-negative histories (33.3%, 83/243) or evidence of prior HPV positivity before the index negative test (14%, 32/243). Possible recurrent HPV positivity following an intervening negative test was rare (0.4%, 1/243). HPV-negative screening results preceding invasive cancer reflect heterogeneous screening histories and cannot be explained solely by test failure. The findings highlight the importance of reaching women earlier in screening programs and show that fluctuating HPV detectability is rare.

9
Effectiveness of 2025-2026 mRNA-1283 and BNT162b2 COVID-19 Vaccines Against COVID-19 Related Hospitalizations and Medically-Attended COVID-19 Among Adults Aged ≥65 years in the United States

Vicic, N.; Bogdanov, A.; Hensler, H.; Ryan, T.; Zeng, N.; Beck, E.; Patry, E.; Bonafede, M.; Araujo, A. B.; Wilson, A.

2026-04-16 infectious diseases 10.64898/2026.04.13.26350772 medRxiv
Top 0.2%
3.2%

Background: The 2025/2026 COVID-19 vaccine season introduced updated formulations targeting the LP.8.1 lineage. This study assessed the absolute vaccine effectiveness (aVE) of mRNA-1283 and BNT162b2 on COVID-19 outcomes in adults aged ≥65 years. Methods: This retrospective study used linked electronic health record and administrative claims data through Jan 31, 2026. Adults ≥65 years who received the mRNA-1283 or BNT162b2 2025/2026 COVID-19 vaccine were matched to unvaccinated individuals. Inverse probability of treatment weighting was applied to matched cohorts of each vaccine to balance covariates. Each vaccine was evaluated independently against its own unvaccinated comparator group. aVE against COVID-19 related hospitalization and medically-attended COVID-19 was estimated using Cox proportional hazards models; aVE = 100 × (1 − hazard ratio [HR]). Results: We identified 233,072 mRNA-1283 recipients and 422,610 BNT162b2 recipients ≥65 years. The aVE (95% confidence interval) of mRNA-1283 against COVID-19 related hospitalization and medically-attended COVID-19 was 59.3% (39.0%, 72.9%) and 42.0% (35.0%, 48.3%) among adults ≥65 years and 66.9% (45.9%, 79.8%) and 50.2% (42.1%, 57.2%) in ≥75 years, respectively. The aVE of BNT162b2 against COVID-19 related hospitalization and medically-attended COVID-19 was 48.3% (32.4%, 60.5%) and 41.2% (36.2%, 45.8%) in ≥65 years and 45.9% (26.0%, 60.4%) and 44.0% (37.8%, 49.6%) in ≥75 years, respectively. Conclusions: This is the first real-world evidence showing that mRNA-1283 prevents COVID-19-related hospitalizations and medically attended events in vulnerable older adults at highest risk of severe disease. These findings support mRNA-1283 as an important public health tool for reducing the ongoing burden of COVID-19.

10
Democratizing Scientific Publishing: A Local, Multi-Agent LLM Framework for Objective Manuscript Editing

Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.

2026-04-17 health informatics 10.64898/2026.04.13.26350761 medRxiv
Top 0.2%
2.5%

Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.

11
Implementation of Human-in-the-Loop ChatGPT-based Patient Screening Across Multiple Diverse Clinical Trials

Dohopolski, M.; Esselink, K.; Desai, N.; Grones, B.; Patel, T.; Jiang, S.; Peterson, E.; Navar, A. M.

2026-03-27 health informatics 10.64898/2026.03.20.26348890 medRxiv
Top 0.3%
2.3%

Purpose: Manual screening for trial eligibility is inefficient and costly. We prospectively evaluated a large language model (LLM)-assisted prescreening workflow across multiple active trials. Methods: We deployed a retrieval-augmented generation LLM-based pipeline across multiple trials at an academic medical center. Structured electronic health record data and free-text notes were used by the LLM to classify each criterion as met, likely met, likely not met, not met, uncertain, or no documentation found, with accompanying rationale. Coordinators were provided a sorted patient list based on LLM-derived eligibility and reviewed each case, documenting their assessment of individual criteria and final prescreening status (success vs failure). Criterion-level performance (accuracy, sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV], and F1 score) was calculated and tracked over time. Patient prescreening status was also evaluated as a function of the percentage of individual AI criteria met (60-80% and ≥80%). Results: From October 2024 to September 2025, 39,182 patients were prescreened using the LLM workflow across 26 studies (21 oncology and 5 non-oncology), encompassing 112 distinct criteria. A total of 914 patients with high likelihood of eligibility underwent coordinator review (5,096 criteria evaluated). Aggregated criterion-level performance was as follows: accuracy 0.94 (95% CI, 0.92-0.96), sensitivity 0.98 (0.97-0.99), specificity 0.81 (0.71-0.88), PPV 0.95 (0.92-0.97), NPV 0.93 (0.90-0.95), and F1 score 0.97 (0.95-0.97). Twenty-seven criteria prompts across 14/26 trials were automatically updated based on coordinator feedback. Compared to patients with 60-80% of AI-labeled criteria classified as met or likely met, those with ≥80% were more likely to be reviewed by coordinators (372/397, 93.7% vs 544/987, 55.1%) and more likely to be labeled as prescreening successes (162/372, 43.5% vs 104/544, 19.1%). The average cost was $0.12 per patient. Conclusion: An LLM-assisted, human-in-the-loop prescreening workflow demonstrated high criterion-level performance at low cost across a diverse set of actively enrolling clinical trials. Structured coordinator feedback enabled an automated learning system, improving screening efficiency while preserving necessary human oversight.
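The criterion-level metrics listed in this abstract all derive from a 2×2 confusion matrix in which the coordinator's assessment serves as the reference label. The sketch below shows the standard definitions; the counts are hypothetical (the abstract does not report the underlying cells), and how the six LLM categories were collapsed into met versus not met is not stated, so a binary label is assumed.

```python
def criterion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


# Hypothetical counts for illustration only.
for name, value in criterion_metrics(tp=90, fp=10, tn=80, fn=20).items():
    print(f"{name}: {value:.3f}")
```

Because F1 ignores true negatives, it can sit well above specificity when positives dominate the evaluated criteria, which is the pattern the reported figures show.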

12
Missed Appointments and Associations with Clinical Outcomes in A Large National Healthcare System

Yin, Y.; Cheng, Y.; Ling, Y.; Ruser, C.; Altalib, H. H.; Masheb, R. M.; Kravetz, J.; Nelson, S. J.; Ahmed, A.; Faselis, C.; Brandt, C. A.; Zeng-Treitler, Q.

2026-03-30 health systems and quality improvement 10.64898/2026.03.28.26349531 medRxiv
Top 0.3%
2.1%

Importance Missed outpatient appointments, including no-shows and cancellations, may disrupt continuity of care and be associated with worse outcomes, but long-term system-wide patterns and clinical implications are not well characterized. Objective To characterize variation in missed appointment rates in the Veterans Health Administration (VHA) over time and by geographic location, visit modality, and preexisting conditions, and to evaluate associations between missed appointment rates and adverse outcomes among veterans with posttraumatic stress disorder (PTSD) or traumatic brain injury (TBI). Design Cohort study using VHA Corporate Data Warehouse outpatient appointment data from January 1, 2000, through December 31, 2024. Setting National integrated health care system of the VHA. Participants The system-level analysis includes all scheduled outpatient appointments with a valid status, and the outcome analysis includes veterans with PTSD (n = 1 429 890) or TBI (n = 554 553), diagnosed before 2023. Exposures For system-level analyses, missed appointment rates were calculated. In outcome analyses, 2023 missed appointment rates were categorized into tertiles within the cohort and appointment type. Main Outcomes and Measures One-year risks of all-cause hospitalization, all-cause mortality, and hospitalization or death beginning January 1, 2024. Results Among 2,162,520,880 outpatient appointments from 2000 to 2024, 6.5% were no-shows and 25.4% were canceled. Across facilities, no-show rates ranged from 3.5% to 14.1%, patient-initiated cancellation rates from 9.7% to 26.0%, and clinic-initiated cancellation rates from 8.5% to 17.9%. In 2023, veterans with amputation, Parkinson disease, PTSD, or TBI had higher missed appointment rates than veterans without these conditions. Among veterans with PTSD, the highest no-show tertile, compared with none, was associated with higher mortality (HR, 1.91; 95% CI, 1.84-1.98) and hospitalization or death (HR, 1.07; 95% CI, 1.05-1.08). 
Among veterans with TBI, the highest no-show tertile was associated with hospitalization or death (HR, 1.65; 95% CI, 1.61-1.69). Conclusions and Relevance Missed outpatient appointments were common in the VHA and varied substantially across facilities and over time. Among veterans with PTSD or TBI, higher missed appointment rates, particularly no-shows, were associated with increased risks of hospitalization and mortality, suggesting that these patterns may help identify high-risk veterans for targeted outreach.

13
Citation Hallucination Determines Success: An Empirical Comparison of Six Medical AI Research Systems

Shi, X.; Tian, Z.; Tan, S.; Wang, X.

2026-04-04 health informatics 10.64898/2026.04.02.26350091 medRxiv
Top 0.3%
2.1%

Large language model (LLM) systems can now generate complete research manuscripts, yet their reliability in clinical medicine - where citation accuracy and reporting standards carry direct consequences - has not been systematically assessed. We introduce MedResearchBench, a benchmark of three clinical epidemiology tasks built on NHANES data, and use it to evaluate six AI research systems across six quality dimensions. Evaluation combines programmatic citation verification, rule-based reporting compliance checks, and multi-model LLM judging, providing a more discriminative assessment than conventional single-judge approaches. Citation integrity emerged as the decisive quality dimension. Hallucination rates ranged from 2.9% to 36.8% across systems, and a hard-rule threshold on per-task citation scores capped four of six systems' total scores at the penalty ceiling. Adding a multi-agent citation verification and repair pipeline to the best-performing system improved its citation integrity score from 40.0 to 90.9 and raised the weighted total from 68.9 to 81.8. Strikingly, a single-model evaluation ranked this system last (55.5), while our three-tier framework ranked it first (81.8) - a complete reversal that exposes the limitations of subjective LLM-only evaluation. These results suggest that programmatic citation verification should be a core metric in future evaluations of AI scientific writing systems, and that multi-agent quality assurance can bridge the gap between fluent text generation and trustworthy scholarship.

14
Demystifying Clone-Censor-Weight Method in Target Trial Emulation: A Real-World Study of HPV Vaccination Strategies

Lin, T.; Li, Y.; Huang, Z.; Gui, T. T.; Wang, W.; Guo, Y.

2026-04-22 health informatics 10.64898/2026.04.21.26351413 medRxiv
Top 0.3%
2.1%

Target trial emulation (TTE) offers a principled way to estimate treatment effects using real-world observational data, but analyses of time-varying treatment strategies remain vulnerable to immortal time bias. The clone-censor-weight (CCW) approach is increasingly used to address this problem, yet key aspects of its causal interpretation and implementation remain unclear. In this work, we emulate a target trial using electronic health records (EHRs) to compare completion of a 3-dose 9-valent human papillomavirus (HPV) vaccination series within 12 months versus remaining partially vaccinated among vaccine initiators. We link CCW to the classic potential outcome framework in causal inference, evaluate the role of different weighting mechanisms, and account for within-subject correlation induced by cloning using cluster-robust variance estimation. Our study provides practical guidance for applying CCW in real-world comparative effectiveness studies to address immortal time bias and supports more rigorous and interpretable treatment effect estimation in TTE.

15
Time to diagnosis among children and adolescents with cancer in Quebec, Canada: a population-based study

Mullen, C.; Barr, R. D.; Strumpf, E.; El-Zein, M.; Franco, E. L.; Malagon, T.

2026-04-13 epidemiology 10.64898/2026.04.09.26350491 medRxiv
Top 0.3%
2.1%

Background: Timely cancer diagnosis in children and adolescents is critical to improving outcomes, yet substantial variation in diagnostic intervals persists across cancer types and care settings. We aimed to quantify time to diagnosis and assess variations by patient, demographic, and system-level factors. Methods: We conducted a retrospective population-based study of children and adolescents aged 0-19 years diagnosed with one of 12 common cancers between 2010 and 2022 in Quebec, Canada. The diagnostic interval was defined as the time from first cancer-related healthcare encounter to diagnosis. We calculated medians and interquartile ranges (IQR) overall and by cancer type and used multivariable quantile regression to identify factors associated with time to diagnosis at the 25th, 50th, and 75th percentiles. Results: Among 2,927 individuals with cancer, diagnostic intervals varied by cancer type and age. Median intervals were longest for carcinomas (100 days; IQR 33-192) and shortest for leukemias (8 days; IQR 3-44). Compared with children living in Montreal, living in regional areas and other large urban centres was associated with longer 50th and 75th percentiles of time to diagnosis for hepatic and central nervous system (CNS) tumours. Diagnostic intervals were shorter in the post-pandemic period (2020-2022) across several cancer sites, with CNS tumours showing reductions across all quantiles. Interpretation: Diagnostic timeliness differed by cancer type, age, and rurality, but not by sex, material, or social deprivation. The shorter diagnostic intervals observed in the post-pandemic period suggest that pandemic-related changes in care pathways may have expedited diagnosis for some cancers.

16
Impact of Primary Graft Dysfunction on Neurodevelopmental Outcomes in Pediatric Heart Transplant Recipients

Monserrate-Marrero, J.; Castro-Medina, M.; Feingold, B.; Giraldo-Grueso, M.; Rose-Felker, K.; Tang, R.; Kobayashi, K.; Diaz-Castrillon, C. E.; McIntyre, K.; Da Silva, L.; Da Silva, J. P.; Morell, V.; Seese, L.

2026-04-02 transplantation 10.64898/2026.03.30.26349794 medRxiv
Top 0.3%
1.9%

Background: Primary graft dysfunction (PGD) remains one of the leading causes of early mortality after pediatric heart transplant (HT). While neurodevelopmental impacts of congenital heart disease (CHD) are well-characterized, the effect of PGD on long-term neurodevelopmental outcomes in pediatric HT recipients remains unknown. We sought to determine the association between PGD and neurodevelopmental outcomes in this population. Methods: We performed a retrospective cohort study using the United Network for Organ Sharing (UNOS) database. All pediatric (age <18 years) isolated heart transplant recipients from 2010-2025 were included. The most recent pre- and post-transplant neurodevelopmental outcomes including cognitive delay, motor development, academic progress, and functional status (stratified by age) were compared between PGD (n=434) and non-PGD groups (n=6956). Results: PGD patients had significantly worse pre-transplant functional status and motor development. Post-transplant, PGD was associated with worse motor development (18.8% vs. 13.0% definite motor delay; p=0.01) and functional status in younger children (39.5% vs. 57.8% able to keep up with peers; p<0.001). Post-transplant stroke occurred 3.5 times more frequently in PGD patients (11.5% vs. 3.3%; p<0.001). Cognitive development (p=0.94) and academic progress (p=0.096) did not differ significantly. Thirty-day (7.8% vs. 1.9%) and 1-year mortality (20.3% vs. 6.4%) were significantly higher in PGD patients (both p<0.001). Conclusions: This is the first study to characterize neurodevelopmental outcomes in pediatric patients undergoing HT with PGD. PGD is associated with significantly worse motor development and functional status independent of pre-transplant baseline. The 3.5-fold higher stroke rate provides a plausible neurological mechanism. The findings support targeted developmental surveillance recommendations and early intervention for this high-risk population.

17
Influenza vaccine effectiveness against influenza-associated hospitalizations and emergency department or urgent care encounters among children and adults - United States, 2024-25 season

DeCuir, J.; Reeves, E. L.; Weber, Z. A.; Yang, D.-H.; Irving, S. A.; Tartof, S. Y.; Klein, N. P.; Grannis, S. J.; Ong, T. C.; Ball, S. W.; DeSilva, M. B.; Dascomb, K.; Naleway, A. L.; Koppolu, P.; Salas, S. B.; Sy, L. S.; Lewin, B.; Contreras, R.; Zerbo, O.; Hansen, J. R.; Block, L.; Jacobson, K. B.; Dixon, B. E.; Rogerson, C.; Duszynski, T.; Fadel, W. F.; Barron, M. A.; Mayer, D.; Chavez, C.; Yates, A.; Kirshner, L.; McEvoy, C. E.; Akinsete, O. O.; Essien, I. J.; Sheffield, T.; Bride, D.; Arndorfer, J.; Van Otterloo, J.; Natarajan, K.; Ray, C. S.; Payne, A. B.; Adams, K.; Flannery, B.; Garg,

2026-04-24 public and global health 10.64898/2026.04.22.26350853 medRxiv
Top 0.3%
1.9%

Background: The 2024-25 influenza season was the most severe in the United States (US) since 2017-18, with co-circulation of both influenza A virus subtypes (H1N1 and H3N2). Influenza vaccine effectiveness (VE) has varied by season, setting, and patient characteristics. Methods: Using electronic healthcare encounter data from eight US states, we evaluated influenza vaccine effectiveness (VE) against influenza-associated hospitalizations and emergency department or urgent care (ED/UC) encounters from October 2024-April 2025 among children aged 6 months-17 years and adults aged 18+ years. Using a test-negative, case-control design, we compared the odds of influenza vaccination between acute respiratory illness (ARI) encounters with a positive (cases) versus negative (controls) test for influenza by molecular assay, adjusting for confounders. Results: Analyses included 108,618 encounters (5,764 hospitalizations and 102,854 ED/UC encounters) among children and 309,483 encounters (76,072 hospitalizations and 233,411 ED/UC encounters) among adults. Among children across care settings, 17.0% (6,097/35,765) of cases versus 29.4% (21,449/72,853) of controls were vaccinated. Among adults, 28.2% (21,832/77,477) of cases versus 44.2% (102,560/232,006) of controls were vaccinated. VE was 51% (95% confidence interval [95% CI]: 41-60%) against influenza-associated hospitalizations and 54% (95% CI: 52-55%) against influenza-associated ED/UC encounters among children. VE was 43% (95% CI: 41-46%) against influenza-associated hospitalizations and 49% (95% CI: 47-50%) against influenza-associated ED/UC encounters among adults. Conclusions: Influenza vaccination provided protection against influenza-associated hospitalizations and ED/UC encounters among children and adults in the US during the severe 2024-25 influenza season. These findings support influenza vaccination as an important tool to reduce influenza-associated disease.
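As a rough illustration of the test-negative comparison described in this abstract, the crude (unadjusted) odds ratio can be recomputed from the vaccination counts reported across care settings. This is only a sketch: the function name `crude_ve` is hypothetical, and because the result is not adjusted for confounders or split by care setting, it should not be expected to match the adjusted VE estimates reported above.

```python
def crude_ve(vacc_cases, total_cases, vacc_controls, total_controls):
    """Return (odds_ratio, ve_percent) from raw test-negative counts.

    Hypothetical helper for illustration; no confounder adjustment is applied.
    """
    # Odds of vaccination among influenza-positive encounters (cases)
    odds_cases = vacc_cases / (total_cases - vacc_cases)
    # Odds of vaccination among influenza-negative encounters (controls)
    odds_controls = vacc_controls / (total_controls - vacc_controls)
    odds_ratio = odds_cases / odds_controls
    # VE expressed as (1 - odds ratio) x 100
    return odds_ratio, (1 - odds_ratio) * 100


# Counts as reported in the abstract (all care settings combined)
child_or, child_ve = crude_ve(6097, 35765, 21449, 72853)
adult_or, adult_ve = crude_ve(21832, 77477, 102560, 232006)

print(f"Children: crude OR = {child_or:.2f}, crude VE = {child_ve:.0f}%")
print(f"Adults:   crude OR = {adult_or:.2f}, crude VE = {adult_ve:.0f}%")
```

The crude figures pool hospitalizations and ED/UC encounters, so they serve only as a sanity check on the direction and rough size of the effect.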

18
County-level decarceration atlas: mechanisms, prevalence, and dynamics of decarceration across 2,870 U.S. counties, 1999-2019

Liu, Y. E.; Li, B.; Warren, J. L.; Gonsalves, G. S.; Wang, E. A.

2026-04-04 public and global health 10.64898/2026.04.02.26349309 medRxiv
Top 0.4%
1.8%

Decarceration, the process of reducing incarceration rates, is increasingly viewed as a strategy to improve population health and reduce health inequities. Yet, evidence on its health effects remains limited and may depend on how decarceration occurs. We developed a national decarceration "atlas" to characterize the mechanisms and dynamics of decarceration across more than 2,800 U.S. counties between 1999-2019. Using longitudinal county-level jail and prison data, we identified four operational types of decarceration: reduced pretrial detention, reduced jail time, reduced prison admissions, and reduced prison time. Nearly two-thirds of counties, including most rural counties, experienced at least one decarceration type during the study period. Declines typically followed periods of recent growth and were relatively modest in magnitude, with median reductions of 19% to 38% ten years after onset. The frequency and timing of decarceration types varied by urbanicity, state, and region, with many counties experiencing multiple mechanisms concurrently. Validation against documented case studies of state and local decarceration demonstrated alignment with known legislative and de facto drivers, while revealing substantial sub-state heterogeneity. This atlas provides a scalable framework and hypothesis-generating resource to support comparative studies of decarceration's heterogeneous health effects.

19
Multi-Task Learning and Soft-Label Supervision for Psychosocial Burden Profiling in Cancer Peer-Support Text

Wang, Z.; Cao, Y.; Shen, X.; Ding, Z.; Liu, Y.; Zhang, Y.

2026-04-04 health informatics 10.64898/2026.04.03.26350034 medRxiv
Top 0.4%
1.7%

Objective: Online cancer peer-support text contains signals of psychosocial burden beyond emotional tone, including treatment burden, financial strain, uncertainty, and unmet support needs. We evaluated 2 modeling extensions: multi-task learning (MTL) for joint prediction of health economics and outcomes research (HEOR) burden dimensions, and soft-label supervision using large language model (LLM)-derived probability distributions. Materials and Methods: We analyzed 10,392 cancer peer-support posts. GPT-4o-mini generated proxy annotations for HEOR burden subscales, composite burden, high-need status, speaker role, cancer type, and emotion probabilities. Study 1 trained a shared ALBERT encoder under 4 MTL conditions: composite and subscale burden targets, each with and without auxiliary heads, using Kendall uncertainty weighting. Study 2 compared soft-label training on LLM emotion distributions with hard-label baselines under regular and token-augmented inputs, evaluating performance against both human labels and AI distributions. Results: Composite-only MTL achieved R²=0.446 for burden regression and weighted F1=0.810 for high-need screening; subscale classification achieved mean weighted F1=0.646. Adding auxiliary role and cancer-type heads reduced regression performance (ΔR² = -0.209). Soft-label training reduced weighted F1 by 0.16 versus hard-label baselines (0.68 vs. 0.86), and token augmentation did not improve performance under soft supervision. Discussion: Composite-only MTL supported modeling of multidimensional burden-related signals from forum text, whereas auxiliary prediction heads appeared to compete with primary tasks. Soft-label training aligned poorly with human-labeled emotion categories, suggesting that uncalibrated LLM distributions may propagate bias rather than improve supervision. Conclusion: Composite-only MTL was the strongest burden-modeling approach, and hard-label supervision remained preferable for emotion classification.

20
Influenza vaccine effectiveness against influenza A-associated hospitalization and severe in-hospital outcomes among adults in the United States, 2024-2025

Lewis, N. M.; Cleary, S.; Harker, E. J.; Safdar, B.; Ginde, A. A.; Peltan, I. D.; Gaglani, M.; Columbus, C.; Martin, E. T.; Lauring, A. S.; Steingrub, J. S.; Hager, D. N.; Mohamed, A.; Johnson, N. J.; Khan, A.; Duggal, A.; Wilson, J. G.; Qadir, N.; Busse, L. W.; Kwon, J. H.; Exline, M. C.; Vaughn, I. A.; Mosier, J. M.; Harris, E. S.; Zhu, Y.; Grijalva, C. G.; Halasa, N. B.; Chappell, J.; Surie, D.; Dawood, F. S.; Ellington, S. R.; Self, W. H.

2026-04-02 infectious diseases 10.64898/2026.03.31.26349873 medRxiv
Top 0.5%
1.4%

Background: The U.S. 2024-2025 influenza season was characterized by sustained elevated activity from November 2024 to April 2025, with circulation of both influenza A(H1N1)pdm09 and A(H3N2), the latter of which included some antigenically drifted viruses. Methods: From October 1, 2024, to April 30, 2025, a multistate respiratory virus surveillance network enrolled adults hospitalized with acute respiratory illness in 26 U.S. medical centers. Influenza vaccine effectiveness (VE) against influenza-associated hospitalization and severe in-hospital outcomes was estimated using a test-negative study. The odds of influenza vaccination among influenza-positive case patients and influenza-negative control patients were compared using multivariable logistic regression; VE was calculated as (1-adjusted odds ratio for vaccination) x 100, expressed as a percent. Results: The 2024-2025 seasonal influenza vaccine was effective against influenza-associated hospitalization (VE: 40% [95% confidence interval (CI): 32%-47%]), consistent across age group and influenza A subtypes. Influenza vaccination also reduced the overall risk of all severe in-hospital outcomes evaluated, including standard oxygen therapy (VE: 41% [95% CI: 31%-50%]), non-invasive advanced respiratory support (VE: 38% [95% CI: 19%-52%]), invasive organ support (VE: 58% [95% CI: 44%-69%]), ICU admission (VE: 58% [95% CI: 47%-67%]), and death (VE: 52% [95% CI: 18%-71%]) with effectiveness varying by influenza A subtype and age. Conclusions: Influenza vaccination reduced the risk of influenza-related hospitalization and severe in-hospital outcomes in adults during the severe 2024-2025 influenza season compared to those not vaccinated.
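The VE formula stated in the Methods, (1 - adjusted odds ratio) x 100, can be sketched as follows. The odds ratios used here are assumptions, back-calculated from the reported hospitalization VE of 40% (95% CI: 32%-47%); the abstract does not report them directly, and the function names are hypothetical.

```python
def ve_percent(adjusted_or):
    """Vaccine effectiveness in percent from an adjusted odds ratio."""
    return (1.0 - adjusted_or) * 100.0


def ve_with_ci(or_point, or_lower, or_upper):
    """Return (VE, VE lower bound, VE upper bound).

    The bounds swap: the upper OR bound maps to the lower VE bound.
    """
    return ve_percent(or_point), ve_percent(or_upper), ve_percent(or_lower)


# Assumed odds ratios, back-calculated from the reported VE (not from the source)
point, lower, upper = ve_with_ci(0.60, 0.53, 0.68)
print(f"VE: {point:.0f}% (95% CI: {lower:.0f}%-{upper:.0f}%)")
```

Note that the confidence bounds reverse order when converting from the odds-ratio scale to the VE scale, which is why the helper passes the upper odds-ratio bound to compute the lower VE bound.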